Transcription Code
The first code, transcription, determines when and how intensively a gene is read,
in particular on the basis of promoter sequences.
Transcription factor binding sites are encoded by short nucleotide sequences, several of
which act together to regulate readout in nucleated cells.
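How such a short binding-site motif can be searched for in a promoter may be illustrated with a position weight matrix. The following is only a minimal sketch: the count matrix `COUNTS`, the pseudocount, the uniform background and the threshold are all hypothetical values chosen for illustration and are not taken from any of the databases named in this chapter. Only the forward strand is scanned.

```python
import math

# Hypothetical count matrix for a 4-bp motif (one column per motif position).
# Each column sums to 10 observed sites; the consensus is A-G-C-T.
COUNTS = {
    "A": [8, 0, 0, 1],
    "C": [0, 0, 9, 1],
    "G": [1, 10, 0, 0],
    "T": [1, 0, 1, 8],
}
MOTIF_LENGTH = 4
BACKGROUND = 0.25   # assumed uniform base composition
PSEUDOCOUNT = 1.0   # avoids log(0) for bases never seen at a position


def log_odds_matrix(counts):
    """Convert raw counts into log2 odds scores against the background."""
    matrix = {base: [] for base in counts}
    for pos in range(MOTIF_LENGTH):
        total = sum(counts[base][pos] for base in counts) + 4 * PSEUDOCOUNT
        for base in counts:
            prob = (counts[base][pos] + PSEUDOCOUNT) / total
            matrix[base].append(math.log2(prob / BACKGROUND))
    return matrix


def scan_sequence(seq, threshold):
    """Return (start, score) for every window scoring at least `threshold`."""
    pwm = log_odds_matrix(COUNTS)
    seq = seq.upper()
    hits = []
    for start in range(len(seq) - MOTIF_LENGTH + 1):
        window = seq[start:start + MOTIF_LENGTH]
        if any(base not in pwm for base in window):
            continue  # skip windows containing N or other ambiguity codes
        score = sum(pwm[base][i] for i, base in enumerate(window))
        if score >= threshold:
            hits.append((start, score))
    return hits


hits = scan_sequence("TTAGCTTT", threshold=4.0)
```

In this toy sequence only the window starting at index 2 (`AGCT`) exceeds the threshold. Real binding sites are short and degenerate, so a single matrix hit is weak evidence on its own, which is consistent with several sites acting together.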
Some programs that can examine a promoter in more detail, such as TESS and
Genomatix, have already been introduced in Chap. 11. There are also databases of
transcription factor binding sites, such as TRANSFAC
(https://www.gene-regulation.com/pub/databases.html),
MotifMap (https://motifmap.igb.uci.edu/) and JASPAR (https://jaspar.genereg.net/).
Some of these can be read and searched freely; others have since become commercial
and are no longer free to use.
But the closer one looks, the less clear the transcription code becomes. Transcription
factors that are still unidentified must also be taken into account, as must more distant
sequences that increase (“enhancers”) or decrease (“silencers”) transcription.
RNA Codes
The next step, the processing and splicing of precursor RNA, also follows its own
codes. Here, the splicing sequences that distinguish between intron and exon have already
been relatively well characterized. But it turns out that each organism has its own dialect
for deciding what to splice and how. A good program that predicts such sequences
adaptively and in a species-specific manner is Augustus
(https://bioinf.uni-greifswald.de/augustus/). It can be specially trained on new species
and uses hidden Markov models for prediction (Stanke et al. 2008).
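The idea behind hidden Markov model prediction can be sketched with a two-state toy model decoded by the Viterbi algorithm. This is not how Augustus itself is implemented (its model is far richer); the states, the start, transition and emission probabilities below are hypothetical values chosen only so that GC-rich stretches look "exon-like" and AT-rich stretches "intron-like". Training on a new species would correspond to re-estimating these parameters.

```python
import math

STATES = ("E", "I")  # E = exon-like, I = intron-like (toy labels)

# Hypothetical parameters, for illustration only.
START = {"E": 0.5, "I": 0.5}
TRANS = {
    "E": {"E": 0.9, "I": 0.1},
    "I": {"E": 0.1, "I": 0.9},
}
EMIT = {
    "E": {"A": 0.2, "C": 0.3, "G": 0.3, "T": 0.2},
    "I": {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3},
}


def viterbi(seq):
    """Return the most probable state path for `seq` as a string of labels."""
    seq = seq.upper()
    if not seq:
        return ""
    # score[s] = log-probability of the best path that ends in state s
    score = {s: math.log(START[s]) + math.log(EMIT[s][seq[0]]) for s in STATES}
    back = []  # back[i][s] = best predecessor of state s at position i + 1
    for base in seq[1:]:
        new_score, pointers = {}, {}
        for s in STATES:
            prev = max(STATES, key=lambda p: score[p] + math.log(TRANS[p][s]))
            new_score[s] = (score[prev] + math.log(TRANS[prev][s])
                            + math.log(EMIT[s][base]))
            pointers[s] = prev
        score = new_score
        back.append(pointers)
    # Trace back from the best final state to recover the full path.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return "".join(reversed(path))


labels = viterbi("GCGCGCGCATATATATAT")
```

With these parameters the GC-rich first eight bases are labelled `E` and the AT-rich remainder `I`. Changing the emission table changes the labelling, which is a small-scale picture of why each species needs its own trained parameters.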
However, one can look at numerous other codes in the RNA, in particular signals
that decide whether the mRNA leaves the nucleus or not (for mRNA in general this is
only a single modified nucleotide, the 7-methylguanosine cap), and numerous other
sequences that regulate the translation, localisation and stability of the RNA (see the
first part; a standard program for reading these codes is the RNAAnalyzer:
https://rnaanalyzer.bioapps.biozentrum.uni-wuerzburg.de).
Protein Codes
Once the protein has been translated according to the genetic code, the question arises as
to whether it is modified post-translationally, i.e. whether sugar residues (e.g. on asparagine
residues), lipids or acetyl groups (e.g. on lysine residues) are added to individual amino acids.
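One post-translational code that can be read directly from the sequence is the consensus motif for N-linked glycosylation, in which a sugar can be attached to an asparagine (N) followed by any residue except proline and then by serine or threonine (N-X-S/T). The sketch below only lists candidate positions; the function name and the example peptide are hypothetical, and a match does not mean that the site is actually glycosylated in the cell.

```python
import re

# N, then any residue except P, then S or T. The lookahead keeps the match
# to the asparagine itself, so overlapping candidate sites are all reported.
SEQUON = re.compile(r"N(?=[^P][ST])")


def find_n_glycosylation_candidates(protein):
    """Return 0-based positions of asparagines in an N-X-S/T sequon."""
    return [match.start() for match in SEQUON.finditer(protein.upper())]


# Hypothetical peptide used only to exercise the function.
sites = find_n_glycosylation_candidates("MNGSANPTNNTS")
```

Here the asparagines at positions 1, 8 and 9 are reported, while the one at position 5 is skipped because it is followed by proline.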
Next, based on its sequence, the protein first forms secondary structure and collapses
into a “molten globule”, usually within milliseconds of its synthesis, and then (within
seconds) arranges itself into its final three-dimensional structure. This complex 3-D code
has not yet been “cracked” either. The biophysical codes are not known in detail, nor do
we have computers powerful enough to predict the structure accurately. In